Programming paradigms |
---|
|
In computer programming, intentional programming is a collection of concepts which enable software source code to reflect the precise information, called intention, which programmers had in mind when conceiving their work. By closely matching the level of abstraction at which the programmer was thinking, browsing and maintaining computer programs becomes easier.
The concept was introduced by long-time Microsoft employee Charles Simonyi, who led a team in Microsoft Research which developed an integrated development environment (IDE) called IP that demonstrates these concepts. For reasons that are unclear,[1] Microsoft stopped working on intentional programming and ended development of IP in the early 2000s.
An overview of intentional programming is given in Chapter 11 of the book Generative Programming: Methods, Tools, and Applications.[2]
Contents |
As envisioned by Simonyi, developing a new application via the Intentional Programming paradigm proceeds as follows. A programmer first builds a toolbox specific to a given problem domain (such as life insurance). Domain experts, aided by the programmer, then describe the application's intended behavior in a WYSIWYG-like manner. Finally, an automated system uses the program description and the toolbox to generate the final program. Successive changes are done at the WYSIWYG level only, employing a system called the "domain workbench".[3]
Key to the benefits of IP is that source code is not stored in text files, but in a binary file that bears a resemblance to XML. As with XML, there is no need for a specific parser for each piece of code that wishes to operate on the information that forms the program, lowering the barrier to writing analysis or restructuring tools.
Tight integration of the editor with the binary format brings some of the nicer features of database normalization to source code. Redundancy is eliminated by giving each definition a unique identity, and storing the name of variables and operators in exactly one place. This makes it easier to intrinsically distinguish declarations from references, and the environment shows declarations in boldface type. Whitespace is also not stored as part of the source code, and each programmer working on a project can choose an indentation display of the source. More radical visualizations include showing statement lists as nested boxes, editing conditional expressions as logic gates, or re-rendering names in Chinese.
The project appears to standardize a kind of XML Schema for popular languages like C++ and Java, while letting users of the environment mix and match these with ideas from Eiffel and other languages. Often mentioned in the same context as language-oriented programming via domain-specific languages, and aspect-oriented programming, IP purports to provide some breakthroughs in generative programming. These techniques allow developers to extend the language environment to capture domain-specific constructs without investing in writing a full compiler and editor for any new languages.
A Java program that writes out the numbers from 1 to 10, using a curly bracket syntax, might look like this:
for (int i = 1; i <= 10; i++) {
System.out.println("the number is " + i);
}
The code above contains a common construct of most programming languages, the bounded loop, in this case represented by the for
construct. The code, when compiled, linked and run, will loop 10 times, incrementing the value of i each time after printing it out.
But this code does not capture the intentions of the programmer, namely to "print the numbers 1 to 10". In this simple case, a programmer asked to maintain the code could likely figure out what it is intended to do, but it is not always so easy. Loops that extend across many lines, or pages, can become very difficult to understand, notably if the original programmer uses unclear labels. Traditionally the only way to indicate the intention of the code was to add source code comments, but often comments are not added, or are unclear, or drift out of sync with the source code they originally described.
In intentional programming systems the above loop could be represented, at some level, as something as obvious as "print the numbers 1 to 10
". The system would then use the intentions to generate source code, likely something very similar to the code above. The key difference is that the intentional programming systems maintain the semantic level, which the source code lacks, and which can dramatically ease readability in larger programs.
Although most languages contain mechanisms for capturing certain kinds of abstraction, IP, like the Lisp family of languages, allows for the addition of entirely new mechanisms. Thus, if a developer started with a language like C, they would be able to extend the language with features such as those in C++ without waiting for the compiler developers to add them. By analogy, many more powerful expression mechanisms could be used by programmers than mere classes and procedures.
IP focuses on the concept of identity. Since most programming languages represent the source code as plain text, objects are defined by names, and their uniqueness has to be inferred by the compiler. For example, the same symbolic name may be used to name different variables, procedures, or even types. In code that spans several pages – or, for globally visible names, multiple files – it can become very difficult to tell what symbol refers to what actual object. If a name is changed, the code where it is used must carefully be examined.
By contrast, in an IP system, all definitions not only assign symbolic names, but also unique private identifiers to objects. This means that in the IP development environment, every reference to a variable or procedure is not just a name – it is a link to the original entity.
The major advantage of this is that if an entity is renamed, all of the references to it in the program remain valid (known as referential integrity). This also means that if the same name is used for unique definitions in different namespaces (such as ".to_string()
"), references with the same name but different identity will not be renamed, as sometimes happens with search/replace in current editors. This feature also makes it easy to have multi-language versions of the program; it can have a set of English-language names for all the definitions as well as a set of Japanese-language names which can be swapped in at will.
Having a unique identity for every defined object in the program also makes it easy to perform automated refactoring tasks, as well as simplifying code check-ins in versioning systems. For example, in many current code collaboration systems (e.g. CVS), when two programmers commit changes that conflict (i.e. if one programmer renames a function while another changes one of the lines in that function), the versioning system will think that one programmer created a new function while another modified an old function. In an IP versioning system, it will know that one programmer merely changed a name while another changed the code.
IP systems also offer several levels of detail, allowing the programmer to "zoom in" or out. In the example above, the programmer could zoom out to get a level that would say something like:
<<print the numbers 1 to 10>>
Thus IP systems are self-documenting to a large degree, allowing the programmer to keep a good high-level picture of the program as a whole.
There are projects that exploit similar ideas to create code with higher level of abstraction. Among them are: